Skip to content

Conversation

@ngc92
Copy link
Collaborator

@ngc92 ngc92 commented Nov 1, 2025

No description provided.

@github-actions
Copy link

github-actions bot commented Nov 1, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  src/libkernelbot
  report.py 73, 350
Project Total  

This report was generated by python-coverage-comment-action

@ngc92 ngc92 force-pushed the ncu-profile branch 3 times, most recently from 248c411 to ce0c6ec Compare November 1, 2025 19:34
small code improvements
@ngc92 ngc92 force-pushed the ncu-profile branch 2 times, most recently from ae72216 to 4928fc2 Compare November 9, 2025 13:39
@ngc92 ngc92 force-pushed the ncu-profile branch 3 times, most recently from de51bbf to 779ee7f Compare November 9, 2025 14:22
@ngc92 ngc92 force-pushed the ncu-profile branch 4 times, most recently from 16b4d0c to d8d7145 Compare November 9, 2025 16:22
@S1ro1 S1ro1 marked this pull request as ready for review November 10, 2025 19:22
Copilot AI review requested due to automatic review settings November 10, 2025 19:22
@S1ro1 S1ro1 merged commit c1973b6 into main Nov 10, 2025
5 of 7 checks passed
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This is a work-in-progress PR that adds NVIDIA Nsight Compute (NCU) profiling support to complement the existing ROCm profiling capabilities. The changes refactor the profiling infrastructure to support multiple profilers and enable per-benchmark profiling runs.

  • Adds NCU profiling implementation with filtered report generation
  • Refactors profiling to run each benchmark separately and embed trace data in results
  • Updates test infrastructure and workflows for new GPU runners

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
src/libkernelbot/run_eval.py Adds NCU profiling function, refactors profile_program to use temporary directories, changes parameter order to use keyword-only arguments, and modifies run_evaluation to handle per-benchmark profiling
src/libkernelbot/report.py Adds File dataclass for attaching profile files to reports, updates report generation to handle multiple profile runs, and adds _shortname helper for filename generation
tests/test_report.py Updates tests to reflect new profile reporting structure with trace field and File attachments
examples/eval.py Adds _run_single_profile_ncu function for NCU-specific profiling and updates profile selection logic based on POPCORN_NCU environment variable
src/kernelbot/discord_utils.py Adds _send_file helper function and updates _send_split_log to add newlines properly and send messages silently
src/kernelbot/discord_reporter.py Adds File handling to display_report for uploading profile attachments
src/kernelbot/api/api_utils.py Adds check to block profile submissions via API and reformats code
src/runners/modal_runner.py Updates Python version from 3.12 to 3.13 in CUDA image
.github/workflows/nvidia_workflow.yml Updates runner to nvidia-docker-b200-8-x86-64, removes container configuration, and simplifies setup steps
.github/workflows/nvidia-arc-health.yml Updates runner and removes Python/PyTorch installation steps
src/libkernelbot/launchers/github.py Adds conditional check for profile-data artifact before setting download_url and reduces polling interval
scripts/ci_test_python.py Updates test to use python3 command and keyword argument for system parameter
scripts/ci_test_cuda.py Updates tests to use keyword argument for system parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

if _handle_crash_report(report, prof_run):
return report

if prof_run.profile.trace is not None:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential AttributeError if prof_run.profile is None. The code checks prof_run.profile.trace is not None without first verifying that prof_run.profile itself is not None. Consider adding a null check:

if prof_run.profile is not None and prof_run.profile.trace is not None:
Suggested change
if prof_run.profile.trace is not None:
if prof_run.profile is not None and prof_run.profile.trace is not None:

Copilot uses AI. Check for mistakes.
Comment on lines +225 to +229
if submission_mode_enum == SubmissionMode.PROFILE:
raise HTTPException(
status_code=400,
detail="Profile submissions are not currently supported via API, use Discord instead.",
)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate check for SubmissionMode.PROFILE on lines 214 and 225. The first check already raises an exception if the mode is PROFILE, so the second check on line 225 is unreachable. Consider removing the duplicate check on lines 225-229.

Suggested change
if submission_mode_enum == SubmissionMode.PROFILE:
raise HTTPException(
status_code=400,
detail="Profile submissions are not currently supported via API, use Discord instead.",
)

Copilot uses AI. Check for mistakes.
Comment on lines 27 to 28
- name: Create input files
shell: bash
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Create input files" step runs apt-get commands without sudo and without a container context. The previous version used a container (image: nvidia/cuda:12.4.0-devel-ubuntu22.04), which provided a root environment. Without a container, these commands will likely fail on the self-hosted runner unless the runner is configured to run as root (which is a security risk). Consider either restoring the container setup or using sudo for the apt-get commands, or ensure jq is pre-installed on the runner.

Copilot uses AI. Check for mistakes.
def log_one(base_name):
spec = run.result.get(f"{base_name}.spec")

report: str = run.result.get(f"{base_name}.report")
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if report is None. The code calls .encode() on report without checking if it exists first. If benchmark.{i}.report is not in the result dictionary, this will raise an AttributeError. Consider adding a check:

report: str = run.result.get(f"{base_name}.report")
if report is None:
    return
report = base64.b64decode(report.encode("utf-8"), b"+*").decode("utf-8")
Suggested change
report: str = run.result.get(f"{base_name}.report")
report: str = run.result.get(f"{base_name}.report")
if report is None:
return

Copilot uses AI. Check for mistakes.
"""
Profiles a single benchmark using ncu. Note: this does not
invoke NCU; instead, it is expected that eval is launched
under NCU, and this function will rurnthe kernel excactly
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the docstring: "rurnthe" should be "run the".

Suggested change
under NCU, and this function will rurnthe kernel excactly
under NCU, and this function will run the kernel excactly

Copilot uses AI. Check for mistakes.
Comment on lines +368 to +370
report.add_file(
f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will cause _shortname() to fail with an AttributeError when trying to call .replace() on None. Consider adding a default value or null check:

spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
report.add_file(
    f"profile-{_shortname(spec)}.zip",
    f"{prof_run.profile.profiler} report - {spec}",
    base64.b64decode(prof_run.profile.trace),
)
Suggested change
report.add_file(
f"profile-{_shortname(prof_run.run.result.get('benchmark.0.spec'))}.zip",
f"{prof_run.profile.profiler} report - " + prof_run.run.result.get("benchmark.0.spec"),
spec = prof_run.run.result.get('benchmark.0.spec', 'unknown')
report.add_file(
f"profile-{_shortname(spec)}.zip",
f"{prof_run.profile.profiler} report - {spec}",

Copilot uses AI. Check for mistakes.

if prof_run.profile.trace is not None:
report.add_log(
f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential error if benchmark.0.spec is not in the result dictionary. The get() method will return None if the key doesn't exist, which will be interpolated as the string "None" in the log header. Consider adding a default value:

f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}"
Suggested change
f"Profiling {prof_run.run.result.get('benchmark.0.spec')}",
f"Profiling {prof_run.run.result.get('benchmark.0.spec', 'unknown benchmark')}",

Copilot uses AI. Check for mistakes.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
with nvtx_range("custom_kernel"):
submission_output = custom_kernel(_clone_data(data, 0))
submission_output = custom_kernel(cloned)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable submission_output is not used.

Copilot uses AI. Check for mistakes.

cloned = _clone_data(data, 0)
with nvtx_range("custom_kernel"):
submission_output = custom_kernel(cloned)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable submission_output is not used.

Suggested change
submission_output = custom_kernel(cloned)
custom_kernel(cloned)

Copilot uses AI. Check for mistakes.
] + call

run_result = run_program(
call, seed=seed, timeout=timeout, multi_gpu=multi_gpu, extra_env={"POPCORN_NCU": "1"}
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants